Multiple Imputation for Sharing Precise Geographies in Public Use Data.

Authors

  • Hao Wang
  • Jerome P Reiter
Abstract

When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects' identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for generating simulated geographies and attributes based on regression trees, and we present methods for assessing disclosure risks with such simulated data.
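The basic idea in the abstract (model latitude and longitude conditional on attributes, then simulate replacement coordinates several times) can be illustrated with a minimal sketch. This is not the authors' regression-tree tool: as a simplifying assumption it swaps in a bivariate normal model fitted separately within each attribute cell, and it does not propagate parameter uncertainty. The function names (`fit_cell`, `synthesize`), the record layout, and the toy coordinates are all hypothetical and invented for illustration.

```python
import math
import random
from collections import defaultdict


def fit_cell(points):
    """Fit a bivariate normal to (lat, lon) points.

    Returns the mean and the lower-triangular Cholesky factor
    (l11, l21, l22) of the sample covariance matrix.
    """
    n = len(points)
    if n < 2:
        # A one-record cell has no spread; simulating from it would
        # simply reproduce the original location.
        raise ValueError("each attribute cell needs at least 2 records")
    mean_lat = sum(p[0] for p in points) / n
    mean_lon = sum(p[1] for p in points) / n
    s_lat = sum((p[0] - mean_lat) ** 2 for p in points) / (n - 1)
    s_lon = sum((p[1] - mean_lon) ** 2 for p in points) / (n - 1)
    s_cross = sum((p[0] - mean_lat) * (p[1] - mean_lon) for p in points) / (n - 1)
    l11 = math.sqrt(s_lat)
    l21 = s_cross / l11 if l11 > 0 else 0.0
    l22 = math.sqrt(max(s_lon - l21 ** 2, 0.0))
    return (mean_lat, mean_lon), (l11, l21, l22)


def synthesize(records, m=5, seed=0):
    """Return m synthetic copies of records with simulated coordinates.

    records: list of (attributes, lat, lon), where attributes is hashable.
    Attributes are kept as-is; only latitude and longitude are replaced.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for attrs, lat, lon in records:
        cells[attrs].append((lat, lon))
    models = {attrs: fit_cell(pts) for attrs, pts in cells.items()}

    datasets = []
    for _ in range(m):
        synthetic = []
        for attrs, _lat, _lon in records:
            (mean_lat, mean_lon), (l11, l21, l22) = models[attrs]
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            new_lat = mean_lat + l11 * z1
            new_lon = mean_lon + l21 * z1 + l22 * z2
            synthetic.append((attrs, new_lat, new_lon))
        datasets.append(synthetic)
    return datasets


# Toy, made-up records: (attributes, latitude, longitude).
records = [
    (("65+", "heart disease"), 35.99, -78.90),
    (("65+", "heart disease"), 36.01, -78.92),
    (("65+", "heart disease"), 36.03, -78.88),
    (("<65", "cancer"), 35.95, -78.95),
    (("<65", "cancer"), 35.97, -78.91),
]
datasets = synthesize(records, m=3, seed=42)
```

Each of the `m` released datasets keeps the original attributes and carries independently simulated coordinates, so analysts would combine estimates across the copies. A real application would also need the disclosure risk assessment the abstract mentions, since a tight cell can still place synthetic points close to true locations.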


Similar resources

Several approaches to handling missing values of quantitative variables and their effect on the results of a clinical trial

Background and Objectives: A major challenge affecting longitudinal studies is the problem of missing data. Missing data may result in the loss of part of the information, which reduces the accuracy of the estimator and makes the results biased and inaccurate. Therefore, it is necessary to evaluate the missing data mechanism in longitudinal research and to consider thi...


Selection of Variables that Influence Drug Injection in Prison: Comparison of Methods with Multiple Imputed Data Sets

Background: Prisoners, compared to the general population, are at greater risk of infection. Drug injection is the main route of HIV transmission, in particular in Iran. What would be of interest is to determine variables that govern drug injection among prisoners. However, one of the issues that challenge model building is incomplete national data sets. In this paper, we addressed the process ...


An Empirical Comparison of Performance of the Unified Approach to Linearization of Variance Estimation after Imputation with Some Other Methods

Imputation is one of the most common methods to reduce item nonresponse effects. Imputation results in a complete data set, and then it is possible to use naïve estimators. After using most common imputation methods, mean and total (imputation estimators) are still unbiased. However, their variances (imputation variances) are underestimated by naïve variance estimators. Sampling mechanism an...


Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)

Most geochemical datasets include missing data in varying proportions, and this may cause a significant problem in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...


Investigating the relationship between knowledge sharing and science production from the viewpoints of faculty members of Knowledge and Information Science at Tehran's public universities

Considering the importance of knowledge sharing as the most important factor in the science production process, this study investigated the relationship between these two variables from the viewpoints of faculty members of Knowledge and Information Science at public universities in Tehran. This was an applied, descriptive-correlational study. The statistical population consisted of 48 facul...



Journal title:
  • The Annals of Applied Statistics

Volume 6, Issue 1

Pages  -

Publication date: 2012